- GREPSMC.LBR
-
- The chief content of this library is a translation of a public domain
- version of the Unix (TM) utility grep into Small C a la Hendrix. This
- implementation of C allows most of the standard control constructs (if, else,
- while, do ... while, for, switch, goto, but not expr1 ? expr2 : expr3),
- only int and char variables, one level of indirection, and a single
- subscript (char *c; and char d[10]; are allowed, but not char **c;, char *d[10];,
- or char e[10][3];).
-
- Grep is a program primarily designed for printing out lines in files
- matching (containing) a specified "regular expression". A particular case of
- a regular expression is just a fully specified string of characters such
- as "#define". Thus
-
- grep #define grepsmc.c
-
- will list all lines in grepsmc.c containing "#define". But regular
- expressions can specify more complicated patterns using "meta-characters"
- such as '^', '$', '.', '*', '+', '-', '\' and '[...]'. For example '^' and
- '$' match the beginning and the end of a line, respectively. Hence
-
- grep ^#define grepsmc.c
-
- matches "#define" only if it starts in column 1, and
-
- grep ^$ grepsmc.c
-
- matches only lines that are empty. The meta-character '.' matches any character
- except the end of line. Thus "^..........$" matches only lines containing
- exactly 10 characters (counting blanks and tabs), and "h..d" matches strings
- "head", "heed", "hold", "hard", etc. '*' matches 0 or more repetitions of the
- preceding character matched. Thus "a.*e.*i.*o.*u" matches any line containing
- the five vowels in alphabetical order. '+' matches 1 or more repetitions of
- the preceding character matched. Thus "a.+e.+i.+o.+u" matches lines containing
- the five vowels in order separated by at least one character. A bracket pair
- "[ ... ]" matches any of the symbols between the brackets. Thus "[bc]a[nt]"
- matches any line containing "ban", "bat", "can", or "cat". '-' can be used
- within brackets to indicate a range of characters. Thus "[A-Za-z]" matches
- any upper or lower case letter, and "[A-Za-z][A-Za-z0-9]*" matches any string
- starting with a letter and continuing with zero or more letters or digits,
- e.g., any C identifier without underscores. The meta-character '\' is used as a quoting character
- or escape character to specify symbols that otherwise have special meaning
- such as '[' or '*'. Thus "\[ *[0-9]+ *\]" matches any single C subscript
- that is specified numerically, possibly surrounded with blanks (e.g., "[ 33]",
- "[ 4 ]", or "[7]", but not "[ i ]").
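The behaviour of '^', '$', '.', '*' and '+' described above can be illustrated with a small backtracking matcher. This is only a sketch in standard C, not the code in GREPSMC.C: the function names (line_matches, match_here, match_repeat) are hypothetical, and bracket classes and the '\' escape are deliberately left out to keep it short.

```c
#include <stddef.h>

static int match_here(const char *re, const char *text);

/* Match at least `min` copies of c (or of any character if c is '.'),
   followed by the rest of the pattern `re`. */
static int match_repeat(int c, int min, const char *re, const char *text)
{
    int count = 0;

    for (;;) {
        /* Try the rest of the pattern after 0, 1, 2, ... copies of c. */
        if (count >= min && match_here(re, text))
            return 1;
        if (*text == '\0' || (c != '.' && *text != c))
            return 0;
        text++;
        count++;
    }
}

/* Match the pattern `re` starting exactly at `text`. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                                   /* pattern used up */
    if (re[1] == '*')
        return match_repeat(re[0], 0, re + 2, text); /* 0 or more */
    if (re[1] == '+')
        return match_repeat(re[0], 1, re + 2, text); /* 1 or more */
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';                       /* end of line */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if `line` (without its newline) contains a match for `re`. */
int line_matches(const char *re, const char *line)
{
    if (re[0] == '^')
        return match_here(re + 1, line);            /* anchored at column 1 */
    do {
        if (match_here(re, line))                   /* try every start position */
            return 1;
    } while (*line++ != '\0');
    return 0;
}
```

With this sketch, line_matches("^$", "") succeeds while line_matches("^$", "x") fails, and line_matches("h..d", "ahead") succeeds because the match may start anywhere in the line.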
-
- There is a problem in CP/M (TM) in that lower case letters in the
- command line are translated to upper case and blanks delimit arguments. Hence
- in this version I have adopted the convention that any letter in a regular
- expression is to be considered to be lower case unless it is immediately
- preceded by '\', in which case it is always considered upper case. Moreover blanks and
- tabs are coded as '_' and '`', respectively. Thus both "[\tt]he" and
- "[\TT]HE" match lines containing either "The" or "the". And to locate all
- blank lines (i.e., all lines that are either empty or contain only blanks or
- tabs) one could use the pattern "^[_`]*$" . To match actual '_' or '`',
- use '\_' or '\~'. Note that some of the examples in the preceding paragraph
- need to be modified to work with the CP/M version. Thus to match an
- identifier use "[\a-\za-z][\a-\za-z0-9]*" or "[\A-\ZA-Z][\A-\ZA-Z0-9]*".
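One way to picture the convention is as a translation step applied to the command-line pattern before matching. The following sketch assumes such a step; the function name decode_cpm_pattern is hypothetical and GREPSMC.C may well handle the convention differently (for instance inside the pattern compiler). Escapes before non-letters, such as '\_' or '\[', are passed through unchanged for the matcher to interpret.

```c
#include <ctype.h>

/* Hypothetical helper: rewrite a pattern typed under the CP/M convention
   into an ordinary pattern. `out` needs room for strlen(in) + 1 bytes. */
void decode_cpm_pattern(const char *in, char *out)
{
    while (*in != '\0') {
        unsigned char c = (unsigned char)*in++;

        if (c == '\\' && *in != '\0') {
            unsigned char next = (unsigned char)*in++;
            if (isalpha(next)) {
                *out++ = (char)toupper(next);   /* \t or \T -> T */
            } else {
                *out++ = '\\';                  /* keep escapes such as \_ or \[ */
                *out++ = (char)next;
            }
        } else if (isalpha(c)) {
            *out++ = (char)tolower(c);          /* unescaped letters are lower case */
        } else if (c == '_') {
            *out++ = ' ';                       /* '_' stands for a blank */
        } else if (c == '`') {
            *out++ = '\t';                      /* '`' stands for a tab */
        } else {
            *out++ = (char)c;
        }
    }
    *out = '\0';
}
```

Under these assumptions both "[\tt]he" and "[\TT]HE" decode to "[Tt]he", which is why the two spellings are equivalent on the command line.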
-
- Usage of grep
-
- The general form of an invocation of grep is as follows ([ ... ]
- signifies an optional component):
-
- grep [ -Flags ] RegularExpression FileList [ > OutputFile ]
- or
- grep [ -Flags ] RegularExpression [ < InputFile ] [ > OutputFile ]
-
- where Flags is a sequence of letters from 'ncfhv', RegularExpression is a
- pattern as described above, FileList is a list of files to be scanned, and
- OutputFile is the optional file on which the output will be put. In the
- second form, InputFile is a single file to be scanned. If both FileList and
- "< InputFile" are omitted, grep will expect its input from the keyboard,
- terminated by ^Z (CTRL Z). This is a useful way to experiment with what a
- particular pattern matches. If more than one file is scanned, the file name
- is printed with each line matched unless the f flag is used (see below).
-
- The meaning of the flags is as follows:
-
- n   print line numbers of lines matched.
- f   reverse default for printing file name, i.e., print name if
-     only 1 file is scanned, omit if more than 1 is scanned.
- v   print only lines that do not match.
- c   print only the total number of lines matched (or not matched
-     if v is specified).
- h   print help information (some additional meta-characters are
-     described).
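The interaction of the flags can be sketched as follows. The struct and function names (grep_flags, parse_flags, show_file_name, line_selected) are made up for illustration and are not taken from GREPSMC.C; the sketch accepts flag letters in either case because CP/M upper-cases the command line.

```c
#include <ctype.h>

/* One field per flag letter described above (hypothetical layout). */
struct grep_flags {
    int n, f, v, c, h;
};

/* Parse an argument such as "-NV". Returns 1 on success,
   0 if a letter outside 'ncfhv' is found. */
int parse_flags(const char *arg, struct grep_flags *fl)
{
    fl->n = fl->f = fl->v = fl->c = fl->h = 0;
    if (*arg == '-')
        arg++;
    for (; *arg != '\0'; arg++) {
        switch (tolower((unsigned char)*arg)) {
        case 'n': fl->n = 1; break;
        case 'f': fl->f = 1; break;
        case 'v': fl->v = 1; break;
        case 'c': fl->c = 1; break;
        case 'h': fl->h = 1; break;
        default:  return 0;
        }
    }
    return 1;
}

/* File names are printed by default only when more than one file
   is scanned; the f flag reverses that default. */
int show_file_name(int nfiles, const struct grep_flags *fl)
{
    int by_default = nfiles > 1;
    return fl->f ? !by_default : by_default;
}

/* A line is reported (or counted, with c) when it matches,
   or when it fails to match and v is set. */
int line_selected(int matched, const struct grep_flags *fl)
{
    return fl->v ? !matched : matched;
}
```

For example, after parse_flags("-F", &fl), show_file_name(1, &fl) is 1 and show_file_name(3, &fl) is 0, mirroring the description of the f flag.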
-
- Other Files
-
- Also included are the files used to create GREP.COM (with the exception
- of the compiler itself). The version of the compiler I have produces assembler
- code suitable for ASM. To avoid having to reassemble the I/O and system-
- related functions, I have created HEX files IOLBCALL.HEX and LIBASM.HEX,
- together with header files, STDIOCB.H and LIBASM.H that provide EQU's for the
- entries in the HEX files. IOLBCALL.HEX contains most of the standard C I/O
- functions (getc, getchar, fgets, putc, fopen, fclose, etc.; see STDIOCB.H for
- others included), as well as the run-time routines needed by the compiled
- code. LIBASM.HEX contains printf and fprintf (which recognize %c, %s, %d, and %x) and
- supporting routines. They were compiled from a library copyrighted by Jim
- Hendrix, modified so as to provide fprintf. Also included is CATLOAD.COM,
- a program allowing creation of a COM file from several HEX files. Usage is
-
- catload file1.hex [file2.hex ... ] comfile.com
-
- Catload can produce a COM file up to about 30K. With SMC.COM (the compiler),
- CATLOAD.COM, STDIOCB.H, LIBASM.H, IOLBCALL.HEX, and LIBASM.HEX on drive a:,
- the sequence I used to create GREP.COM was
-
- A>SMC B:GREPSMC.C > B:GREPSMC.ASM
- A>ASM GREPSMC.BBZ
- A>CATLOAD IOLBCALL.HEX LIBASM.HEX B:GREPSMC.HEX B:GREP.COM
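To give an idea of the work a HEX-to-COM loader has to do in the last step, here is a sketch of reading one Intel HEX record into a memory image. It is not CATLOAD's code, whose internals are not described here; the function name load_hex_record and the image buffer are assumptions. The sketch relies only on the standard Intel HEX record layout (":", byte count, 16-bit address, record type, data, checksum) and on the fact that CP/M loads COM files at address 0100h, so a byte assembled for address A lands at offset A - 0100h in the file.

```c
#include <stddef.h>

#define COM_BASE 0x0100u   /* CP/M loads COM files at address 0100h */

static int hex_digit(int ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    return -1;
}

/* Read two hex digits at s; return 0..255, or -1 on a bad digit. */
static int hex_byte(const char *s)
{
    int hi = hex_digit((unsigned char)s[0]);
    int lo = (hi < 0) ? -1 : hex_digit((unsigned char)s[1]);
    return (lo < 0) ? -1 : hi * 16 + lo;
}

/* Store the data bytes of one record in image[]. Returns the number of
   bytes stored, 0 for an end record, or -1 for a malformed record. */
int load_hex_record(const char *rec, unsigned char *image, size_t image_size)
{
    int field[4];   /* byte count, address high, address low, record type */
    unsigned sum = 0, addr;
    const char *p = rec + 1, *data;
    int i, b;

    if (rec[0] != ':')
        return -1;
    for (i = 0; i < 4; i++, p += 2) {
        if ((field[i] = hex_byte(p)) < 0)
            return -1;
        sum += (unsigned)field[i];
    }
    data = p;
    /* First pass: all bytes plus the checksum must sum to 0 modulo 256. */
    for (i = 0; i <= field[0]; i++, p += 2) {
        if ((b = hex_byte(p)) < 0)
            return -1;
        sum += (unsigned)b;
    }
    if ((sum & 0xFFu) != 0)
        return -1;
    if (field[3] != 0 || field[0] == 0)
        return 0;                       /* end record: nothing to store */
    addr = (unsigned)field[1] * 256u + (unsigned)field[2];
    if (addr < COM_BASE ||
        (size_t)(addr - COM_BASE) + (size_t)field[0] > image_size)
        return -1;                      /* outside the COM image */
    /* Second pass: copy the data bytes to their place in the image. */
    for (i = 0; i < field[0]; i++)
        image[addr - COM_BASE + i] = (unsigned char)hex_byte(data + 2 * i);
    return field[0];
}
```

A loader built on this idea would call the function for every line of each HEX file in turn and then write the filled part of the image out as the COM file.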
-
- Christopher Bingham
- 792 Osceola Avenue
- St. Paul, MN 55105
- July 25, 1986.